Unit selection using pitch synchronous cross correlation for Japanese concatenative speech synthesis
Abstract
We describe a corpus-based approach to improving synthesized speech quality and present two useful cost functions for unit selection. One is pitch-synchronous cross correlation for concatenation costs, which reduces the noise caused by phase mismatch at concatenation points. The other is a discontinuous cost function for internal and concatenation costs, which eliminates unnecessary cost calculation. An evaluation showed that incorporating the pitch-synchronous cross correlation cost was better than using a conventional cost function. In addition, an opinion test to assess the naturalness of the synthesized speech indicated that the proposed method was 0.7 points better on a seven-point MOS (mean opinion score) scale than the conventional system. This paper also discusses other improvements in the performance of text-to-speech systems. In this session, we will demonstrate our Japanese text-to-speech system.
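The abstract does not give the formula for the pitch-synchronous cross correlation cost, so the following is only a minimal sketch of the general idea, under stated assumptions: each candidate join is represented by a two-period frame centered on a pitch mark, and the cost is taken as one minus the normalized cross correlation of the two frames. The function names (`extract_frame`, `normalized_xcorr`, `concat_cost`) and the synthetic 200 Hz test signal are hypothetical and are not taken from the paper.

```python
import numpy as np


def extract_frame(signal, pitch_mark, period):
    """Return a two-period frame centered on a pitch mark (assumed framing)."""
    return signal[pitch_mark - period: pitch_mark + period]


def normalized_xcorr(a, b):
    """Zero-lag normalized cross correlation of two equal-length frames, in [-1, 1]."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0  # silent frame: treat as uncorrelated
    return float(np.dot(a, b) / denom)


def concat_cost(left_frame, right_frame):
    """Assumed cost: 0 when the frames are in phase, 2 when they are in opposite phase."""
    return 1.0 - normalized_xcorr(left_frame, right_frame)


# Synthetic voiced signal: 200 Hz sine sampled at 16 kHz, so one period is 80 samples.
fs, f0 = 16000, 200
period = fs // f0
signal = np.sin(2 * np.pi * f0 * np.arange(fs) / fs)

# Frames taken at true pitch marks (multiples of the period) are phase aligned.
aligned = concat_cost(extract_frame(signal, 2 * period, period),
                      extract_frame(signal, 4 * period, period))

# A frame taken half a period off the pitch mark is in opposite phase.
mismatched = concat_cost(extract_frame(signal, 2 * period, period),
                         extract_frame(signal, 4 * period + period // 2, period))

print(f"aligned cost: {aligned:.3f}, mismatched cost: {mismatched:.3f}")
```

In this sketch a low cost marks a join whose waveforms line up in phase, which is the property the abstract says the cost is meant to reward. A real system would combine such a term with the other internal and concatenation costs mentioned above.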
Similar resources
Prosody-based unit selection for Japanese speech synthesis
A corpus-based concatenative speech synthesis system using no signal processing can produce intelligible synthetic speech maintaining original voice characteristics. In such a concatenative system, it is very important to select appropriate waveform segments that are naturally close to the target prosody. But with a limited size database it can sometimes be difficult to realize natural prosody. T...
Generation of Unit Databases for the UPC Text to Speech System
This paper describes a method for the generation of unit databases for concatenative text-to-speech systems. The method comprises the automatic segmentation and pitch synchronous labeling of the units and a selection procedure to extract the best instance per unit from a generic speech corpus. The segmentation is performed by an automatic HMM alignment. The introduction of the demiphone improve...
Using 5 ms segments in concatenative speech synthesis
A concatenative speech synthesis system increases its potential to generate natural speech if the system uses more short speech segments, since the concatenation variation becomes greater. In this paper, we propose the use of very short speech segments (5 ms, one pitch period of 200 Hz pitch) for concatenative speech synthesis. The proposed method is applied to the speech database CMU ARCTIC, a...
Improving speech synthesis of CHATR using a perceptual discontinuity function and constraints of prosodic modification
Concatenative synthesis is widely used in TTS to generate synthetic speech with high quality and relatively natural-sounding prosody. Whatever the type of synthesis unit used (diphone, phoneme, etc.), a large speech database is usually needed to ensure the phonetic and phonemic variation of the units in a rich variety of contexts. In the CHATR synthesis system, unit selection finds the most appr...
Modification of pitch using DCT in the source domain
In this paper, we propose a novel algorithm for pitch modification. The linear prediction residual is obtained from pitch synchronous frames by inverse filtering the speech signal. Then the Discrete Cosine Transform (DCT) of these residual frames is taken. Based on the desired factor of pitch modification, the dimension of the DCT coefficients of the residual is modified by truncating or zero p...